# Expected length of a binary pattern occurring in a run.
Suppose we have a fair coin that gives either head (H) or tail (T) with equal probability, and we fix a finite heads-tail sequence $X=X_{1}X_{2}\cdots X_{k}$ of length $k$, where $X_{i} \in \{H,T\}$, what is the expected length of tosses of this fair coin until we see the pattern $X$?
For example if the sequence is $HH$, then what is the expected number of tosses in a row until we finally see a consecutive $HH$? How about $HT$? Do you have a guess which pattern would we expect to see first? Let us compute this.
Let $x = E(L)$ to be the expected length of seeing $HH$.
Now, if the first toss is $T$, then essentially we restart the process, but with one additional toss added. This happens half the time, so we have $$
x = E(L) = \frac{1}{2} (1 + x) + \cdots
$$For the other half of the time, we start with the toss $H$. Now if the following toss is $H$, then we win, but if the following toss is $T$, then it essentially restarts the process. So $$
E(L) = \frac{1}{2} (1 + x) + \frac{1}{2}\left( \frac{1}{2} \cdot 2 + \frac{1}{2}(2+ x)\right)
$$Solving for $E(L)$ gives $$
E(L) = 6.
$$
What about $HT$? Again let us set $x = E(L)$.
Suppose the first toss is $T$, then this restarts the waiting process, so we have $$
x = E(L) = \frac{1}{2} (1 + x) + \cdots
$$And if the first toss is $H$, two things happens afterwards. Either we follow up with $T$, which then we win, or we toss $H$, which again we will need to wait for $T$ again. So given the first toss is a head, we now become a geometric distribution $Y\sim Geom(\frac{1}{2})$ waiting for a $T$. Hence $$
x = E(L) = \frac{1}{2}(1+x) + \frac{1}{2}(1 + E(Y)).
$$Since $E(Y) = 2$, we can solve for $x = E(L)$, giving us $$
E(L) = 4.
$$
This is interesting! As these two patterns have unequal expected waiting lengths!
Let us populate these first few patterns and their expected waiting lengths: $$
\begin{array}{}
\text{pattern} & \text{expected length} \\ \hline
H & 2 \\
T & 2 \\
HH & 6 \\
HT & 4 \\
TH & 4 \\
TT & 6
\end{array}
$$We can get the other ones by using symmetry. And the expected length of a length one pattern is just the expected value of a geometric variable $Y \sim Geom(\frac{1}{2})$.
This gives rise to an interesting question. Given some pattern $X$, what is $f(X)$ the expected waiting length of seeing pattern $X$?
We can employ the same method as we had to compute these.
For example, let us compute $x = f(HHH)$.
If the first toss is $T$, then we end up restarting the process, so we have $$
x = f(HHH) = \frac{1}{2}(1+x) + \cdots
$$
If instead we start with $HT$, then we also restart the process. So we have $$
x = f(HHH) = \frac{1}{2} (1 + x) + \frac{1}{4} (2 + x) + \cdots
$$
If instead we start with $HH$, it leads to $HHH$ or $HHT$. The first case we win, while the second case we restart. So we have $$
x = f(HHH) = \frac{1}{2} (1 + x) + \frac{1}{4}(2 + x) + \frac{1}{8}(3) + \frac{1}{8}( 3 + x)
$$
Solving for $x = f(HHH)$ gives $$
f(HHH) = 14.
$$
## A graphical analysis.
Suppose we a pattern $HTHHT$ that we want to find the expected length to see this pattern from fair coin tosses. We construct this binary tree
```
x
/ \
H(y) x
/ \
y T(z)
/ \
H x
/ \
H z
/ \
y T
```
This shows that if we set $x = E(L)$ the expected length of the run to see the pattern $HTHHT$, we have the system of equation $$
\begin{align*}
x &= \frac{1}{2}(1+y) + \frac{1}{2}(1+x) \\
y &= \frac{1}{2}(1+y) + \frac{1}{2}(1+z) \\
z &= \frac{1}{2}(1+x) + \frac{1}{4}(2+z) + \frac{1}{8}(3) + \frac{1}{8}(3+y)
\end{align*}
$$
This gives $x = E(L) = 36$.
Another example. Suppose the pattern is $HTHTH$, then we have the following binary tree instead.
```
x
/ \
H(y) x
/ \
y T
/ \
H x
/ \
y T
/ \
H x
```
Which gives the following system of equations $$
\begin{align*}
x &= \frac{1}{2}(1+y) + \frac{1}{2}(1+x) \\
y &= \frac{1}{2}(1+y) + \frac{1}{4}(2+x) + \frac{1}{8}(3 + y) + \frac{1}{16}(4) + \frac{1}{16}(4+x)
\end{align*}
$$
This gives $x = E(L) = 42$.
A Python3 code to compute this, with the help of SymPy to solve a linear system is given below:
```
def matching(chk, tar):
temp = chk
done = tar.startswith(temp)
while not done:
temp = temp[1:]
done = tar.startswith(temp)
return len(temp)
def seq(pattern):
length = len(pattern)
ret = []
for i in range(length):
wh = pattern[:i] + 'H'
wt = pattern[:i] + 'T'
ret.append([matching(wh,pattern), matching(wt,pattern)])
return ret
from sympy.matrices import Matrix, eye
def expected_length(pattern):
arr = []
b = []
size = len(pattern)
for v in seq(pattern):
arr += [0+(i in v) for i in range(size)]
b += [1]
b = Matrix(size,1,b)
return int((eye(size) - 1/2 * Matrix(size,size,arr)).LUsolve(b)[0])
# This function computes the expected length to see a pattern in a coin toss run
expected_length('HTTH') # 18
expected_length('HTTTHHTHH') # 514
```
Is there a better way?